LC Analysis report - Introduction & Research Questions

Our overall aim is to study what KIND of learning occurs in NF, so that we can examine in more detail the relationship between patients’ clinical activity and treatment outcome. This suggests two foci: (A) developing modelling methods to examine learning more thoroughly than any previous work; (B) applying the developed models of learning to our clinical trial data to tease out the mechanisms of different protocols and modes of training.

Prior work seems to have focused on overall performance improvement, mainly so that subjects can be split into learners and non-learners, to estimate the efficacy of treatment for learners only. This is a neat trick when studying NF efficacy, but not very useful for studying learning itself.

To develop methods to examine learning, we must study performance at the trial, session, and treatment levels, and measure performance both in terms of magnitude and the pattern of change. To achieve this, we derive data that robustly capture (1) magnitude and (2) change for all normal and transfer trials (details below). Normal and transfer trials are joined in order to minimise the difference in the number of trials per session across treatment (which changes because inverse trials are introduced halfway through).

Thus, in part A, our data is within-subjects ‘learning curves’ (LCs), defined as: performance (1) magnitude and (2) change computed from trials within each session, for all sessions.

In part B the data can be subdivided into separate protocols and training modes, to explore how they differ in terms of the best-fit learning model(s) (parametric and non-parametric). The split between A and B helps distinguish task-learning from task-outcomes, even when such outcomes are measured repeatedly throughout the training, such as baseline bandpowers or repeated symptom self-reports.

A. Developing modelling methods

Following the literature (Fitts, Posner), we investigate whether NF shows evidence of being skill acquisition, going beyond operant conditioning. This would also be clinically important: conditioning could conceivably be automated and turned into, e.g., an ‘app’, whereas skill acquisition requires coaching.

To investigate learning, we must deal with the complexity of analysing learning: there are different qualities of learning captured by different sorts of analysis, e.g.

  • magnitude of change in overall performance (increase from beginning to end)
  • consistency of performance improvement (monotonicity within sessions and trials)
  • shape of performance curve:
    • linear
    • power law (or piecewise power law)
    • exponential or sigmoidal
  • plateau point of performance improvement across sessions

The results in draft paper “Learning Curves_v02.docx” show quantification for magnitude and plateau point, where session-wise scores were fitted with linear (growth curve) and quadratic models. These two LC concepts are the main focus of prior work on NFB learning. When studies estimate whether subjects have learned, they usually calculate gain in some way (REF). Several studies have also estimated sufficiency by looking at the number of sessions required to see a plateau in improvement (REF).

Then there is the issue of how to quantify aspects or representations of learning. Parametric methods include linear regression, curve fitting, and hierarchical models. These can be useful, but are also sensitive to violations of assumptions, which can be hard to avoid in noisy data. Non-parametric approaches can help, and we develop one based on cosine similarity between performance metrics and canonical learning curves.
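To make the non-parametric idea concrete, the core operation is the cosine similarity between a subject’s session-wise performance vector and a canonical learning curve. A minimal sketch (in Python for illustration; the vectors are made-up examples, not CENT data):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b:
    1 = same shape, 0 = orthogonal, -1 = opposite shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up example: a noisy increasing curve vs a canonical linear increase
performance = [0.1, 0.3, 0.2, 0.5, 0.7]
canonical = [0.0, 0.25, 0.5, 0.75, 1.0]
print(round(cosine_similarity(performance, canonical), 3))  # -> 0.973
```

Because the measure depends only on the shape (angle) of the vectors, it compares the pattern of change while remaining indifferent to overall scale.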

A: Research Questions

First, it is important to establish overall performance improvement, for context. If the scores of all trials in all types of training tend to increase, that tells us that learning of some kind must have happened, and we can proceed to study what kind.

  • RQ1 - overall performance magnitude: do scores grow from beginning to end of treatment?

It is also worth checking whether learning has plateaued, by fitting a quadratic curve and checking the sign of the quadratic term.

  • RQ2 - plateau of performance improvement reached? Do performance scores reach a high point before the end of the training, and hold steady? Or are they still growing by the end of treatment?

As noted above, a linear model of learning neither fits learning theory nor provides much more information than that performance increased or decreased. Other curve families with better empirical support have been used to describe learning. Power-law curves were long thought to best describe learning due to practice (Newell & Rosenbloom, 1981). However, other curve families have been argued to fit better to non-averaged (individual) curves, e.g. exponentials (Heathcote et al., 1999). Further, if performance conforms to a multi-stage profile, such as the three-stage model of motor skill learning (Fitts, Posner), then a piecewise power-law model can fit better. The type of task reward that the data come from also matters: in success-only tasks, a sigmoid curve can fit the data (Leibowitz et al., 2010), and notably a sigmoid arguably consists of three phases, relating it back to Fitts-Posner.

  • RQ3 - what kind of learning? Which family of curves fits best to the data (by parametric modelling)?

These parametric approaches have some issues: violations of their assumptions, such as the presence of outliers, can be hard to avoid in noisy data; some fitted models are very sensitive to small changes in the data, so outliers can substantially change a linear-fit slope or the shape of a fitted curve; and, perhaps most importantly, treatment-level curve-fitting is blind to intra-session patterns. It is therefore valuable to also build LCs that are model-free, i.e. non-parametric, and can account flexibly for intra-session variability. We develop non-parametric LCs (based on cosine similarity) to provide clear, easy-to-interpret models of learning that are easy to adapt to diverse theories.

  • RQ4 - what kind of learning? Is the skill acquisition theory supported by our novel non-parametric model?

B. Mechanisms of TB vs SMR protocols, normal/inverse/transfer modes of training, and other covariates

Once the best-fitting learning model(s) are established, we can use them to explore the real complexity of the CENT NF data, including the TB and SMR protocols and the normal, inverse, and transfer training modes. Each protocol has a theoretical grounding in the regulation of cortical arousal and motoric activation. Thus, we can also relate these sub-groups to the per-session baseline bandpowers, the pre-test baseline vigilance analysis, and the per-session sleep self-reports.

B: Research Questions

Learning in protocols

  • RQ5 - compare TB and SMR protocols on best-fit learning model(s)
    • Method - graphical and statistical comparison of TB vs SMR
    • Test - test distribution difference for scalar features or complete learning curve time series
      • Kolmogorov-Smirnov test for scalar features, e.g. cosine similarity
      • Maximum-Width Envelope test for complete learning curves / time-series
    • TODO - pending part A, apply best-fit learning model and derive features to test

Learning in training modes

  • RQ6 - compare normal, inverse, transfer (and combinations?) training modes on best-fit learning model(s)
    • Method - graphical and statistical comparison of training modes
    • Test - test distribution difference for scalar features or complete learning curve time series
      • Kolmogorov-Smirnov test for scalar features, e.g. cosine similarity
      • Maximum-Width Envelope test for complete learning curves / time-series
    • TODO - pending part A, apply best-fit learning model and derive features to test

Learning related to session-wise baseline bandpowers

  • RQ7 - learning model(s) moderate or are moderated by the baseline bandpowers
    • Method -
    • Test -
    • TODO -

Learning related to vigilance baseline and sleep self-reports

  • RQ8a - learning model(s) are moderated by baseline vigilance classification from EEG
    • Method -
    • Test -
    • TODO -
  • RQ8b - learning model(s) are moderated by, or moderate, sleep self-reports
    • Method -
    • Test -
    • TODO -

Summary

Our analysis approach has two parts: (A) investigate models of learning in NF data to find best-fit LCs; and (B) use the derived LCs to discover group-wise patterns in the various background and outcome variables available, in a within-subjects manner.

In the rest of the report, we will step through the methods for creating all LCs, to see how they work. We will then explore the relationships with background and outcome variables. First, we describe the data used.

LC Analysis report - Data Preparation

We work primarily from the following datasets (available in the shared Dropbox folder or on request):

The raw data is saved as ‘tr.raw’. To create a clean dataset ‘tr.blk’, we filter trials to remove the first session (as it was a training session), session 41 (completed by only 1 patient), trials with score = 0, and trials marked bad by trainers.
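The filtering logic can be sketched as follows (an illustrative Python sketch; the field names `session`, `score`, and `bad` are assumptions standing in for the actual tr.raw columns):

```python
def clean_trials(trials):
    """Mirror the tr.raw -> tr.blk cleaning: drop the first (practice)
    session, session 41, zero-score trials, and trainer-flagged trials."""
    return [t for t in trials
            if t["session"] not in (1, 41)
            and t["score"] != 0
            and not t["bad"]]

raw = [
    {"session": 1,  "score": 12.0, "bad": False},  # practice session: dropped
    {"session": 5,  "score": 0.0,  "bad": False},  # zero score: dropped
    {"session": 5,  "score": 15.6, "bad": True},   # trainer-flagged: dropped
    {"session": 5,  "score": 18.2, "bad": False},  # kept
]
print(len(clean_trials(raw)))  # -> 1
```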

## [1] "Data: tr.raw$adj_score"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -2.25   10.17   15.59   16.68   22.25   87.24 
## [1] "... subset:  TB"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -1.20   12.29   19.93   20.34   27.26   87.24 
## [1] "... subset:  SMR"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -2.25    9.53   14.22   14.88   19.52   68.90
## [1] "Data: tr.blk$adj_score"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   12.63   17.06   19.01   23.48   72.96 
## [1] "... subset:  TB"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.16   16.95   22.00   23.70   29.09   72.96 
## [1] "... subset:  SMR"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   11.71   15.37   16.77   20.10   68.90

Training modes

We also subset scores according to the three training modes (normal, inverse, transfer), and the complement set (normal+transfer = not inverse). So there are five possible (clean) datasets:

  • all trials
  • normal trials
  • not-inverse trials
  • inverse trials
  • transfer trials

To proceed with Part A, we consider only not-inverse trials.

## [1] "Data: tr.not0$adj_score"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   12.26   16.16   18.04   21.45   72.96 
## [1] "... subset:  TB"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.73   17.99   22.91   24.97   30.16   72.96 
## [1] "... subset:  SMR"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   11.14   14.30   14.74   17.60   68.90

Filtering for correlation calculations

We come across our first significant problem. Calculating trial-wise correlations requires 2 or more trials per session, yet many sessions contain only one trial of a certain type, especially transfer trials. We therefore create datasets that prune out sessions with only one trial. We also face a constraint when calculating the cosine similarity of trial-wise correlations with a hypothetical multi-phase learning curve: the number of sessions must be enough to accommodate the definition of the hypothetical curve (e.g. 3+ for Fitts’ model).
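The pruning step can be sketched as follows (illustrative Python; the grouping keys are assumptions):

```python
from collections import Counter

def prune_single_trial_sessions(trials):
    """Keep only trials from (subject, session) groups with >= 2 trials,
    so that trial-wise correlations are defined."""
    counts = Counter((t["subject"], t["session"]) for t in trials)
    return [t for t in trials if counts[(t["subject"], t["session"])] >= 2]

trials = [
    {"subject": "P1", "session": 2, "score": 10.0},
    {"subject": "P1", "session": 2, "score": 12.5},
    {"subject": "P1", "session": 3, "score": 11.0},  # lone trial: pruned
]
print(len(prune_single_trial_sessions(trials)))  # -> 2
```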

## [1] "Data: tr.not$adj_score"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   12.26   16.16   18.04   21.45   72.96 
## [1] "... subset:  TB"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.73   17.99   22.87   24.96   30.16   72.96 
## [1] "... subset:  SMR"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   11.14   14.31   14.74   17.62   68.90

Data description methods

We describe two aspects of learning on a per-session basis:

(1) Magnitude - derived as a score mean

  • We use the outlier-resistant geometric mean of trial-wise scores per session. For comparison, we also show the median-derived session-wise scores. Either would do for calculating session LCs, because both are robust to outliers. However, for NFB we cannot assume any model of performance, because we do not theoretically know how it arises, so we do not want to reject outliers entirely: they might represent something important. Thus, for this report we use the geometric mean. We centre and scale the mean to lie from -1 to 1.

(2) Consistency - derived as score monotonicity

  • We use the rank-order correlation of per-trial adjusted score with trial order. We use Kendall rather than Spearman rank correlation, as it is recommended for low N; the interpretation remains the same. The outcome ranges from -1 to 1: a monotonic increase in score within a session gives Kendall τ = 1, and a monotonic decrease gives τ = -1. Presumably, a monotonic increase in performance scores is a positive sign of learning.
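The two session indices can be sketched as follows (a minimal Python illustration with made-up trial scores; the geometric mean requires positive scores, and this Kendall sketch omits tie correction):

```python
import math
from itertools import combinations

def geometric_mean(scores):
    """Outlier-resistant average of positive trial-wise scores."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))

def kendall_tau(x, y):
    """Kendall rank correlation: +1 when y increases monotonically with x,
    -1 for a monotonic decrease (no tie correction in this sketch)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (len(x) * (len(x) - 1) / 2)

session_scores = [12.0, 15.5, 14.0, 18.2]          # trial-wise adjusted scores
magnitude = geometric_mean(session_scores)
consistency = kendall_tau(list(range(len(session_scores))), session_scores)
print(round(magnitude, 2), round(consistency, 2))  # -> 14.75 0.67
```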

Group-descriptive stats and plots per subject follow below:

## [1] "GEOMETRIC MEAN ADJ SCORE:"
##    vars  n  mean  sd median trimmed  mad   min   max range  skew kurtosis
## X1    1 39 17.87 2.2  17.75   17.92 2.48 13.59 22.45  8.86 -0.12    -0.87
##      se
## X1 0.35
## [1] "Data: sn.not$adj_score"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.89   12.82   16.14   18.31   21.34   55.64 
## [1] "... subset:  TB"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.10   19.25   23.01   25.30   30.04   55.64 
## [1] "... subset:  SMR"
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.89   11.69   14.27   14.60   16.79   35.83
## [1] "KENDALL CORRELATIONS:"
##    vars  n mean   sd median trimmed  mad  min  max range skew kurtosis
## X1    1 39 0.11 0.14   0.12    0.11 0.11 -0.2 0.43  0.63 0.33    -0.07
##      se
## X1 0.02

LC Analysis report - Part A - Methods + Results

RQ1 - overall performance magnitude

  • Method - linear model of all trials: use R to replicate Edua’s thesis, test model fit
  • Test - significance of slope (‘growth curve’)

We will fit a series of linear mixed-effects models:

  • Unconditional: no factor for session; this simply models the intercept per subject
  • Fixed linear, random intercept: the intercept is allowed to be random, but a single group-wide slope is fixed
  • Random linear, random intercept: the slope is also allowed to vary per subject

The idea is that we can measure the improvement in model fit as we make the models more realistic, and visualise the fit in terms of residuals. Thus, below we show summaries for each fitted model, and two plots. First, linear models are plotted for each subject (black), overlaid with the ‘prototype’ function for the model, i.e. Intercept + βx. Next, the residuals are plotted, showing how much variance is not accounted for by the model.

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##       AIC      BIC   logLik
##   5378.62 5392.834 -2686.31
## 
## Random effects:
##  Formula: ~1 | patient
##         (Intercept) Residual
## StdDev:    5.792983 5.536973
## 
## Fixed effects: adj_score ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 17.95138   1.22288 822 14.67959       0
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.13120599 -0.55418306 -0.03893965  0.48498069  5.06274016 
## 
## Number of Observations: 845
## Number of Groups: 23

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##        AIC      BIC    logLik
##   5306.978 5325.925 -2649.489
## 
## Random effects:
##  Formula: ~1 | patient
##         (Intercept) Residual
## StdDev:    5.830495 5.276632
## 
## Fixed effects: adj_score ~ 1 + session 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 15.002825 1.2707104 821 11.806644       0
## session      0.156259 0.0170609 821  9.158915       0
##  Correlation: 
##         (Intr)
## session -0.253
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.04661902 -0.60865209 -0.05272864  0.49917681  4.95880851 
## 
## Number of Observations: 845
## Number of Groups: 23

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##        AIC      BIC    logLik
##   5257.384 5285.806 -2622.692
## 
## Random effects:
##  Formula: ~1 + session | patient
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 4.2472191 (Intr)
## session     0.1514109 0.341 
## Residual    5.0373681       
## 
## Fixed effects: adj_score ~ 1 + session 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 14.933067 0.9536819 821 15.658331       0
## session      0.161691 0.0355815 821  4.544263       0
##  Correlation: 
##         (Intr)
## session 0.132 
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.56870595 -0.56502094 -0.05054914  0.48115477  5.21577503 
## 
## Number of Observations: 845
## Number of Groups: 23

RQ1 - testing

We can then test the various models by ANOVA, to check the significance of model fit differences.

##           Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## um.fit        1  3 5378.620 5392.834 -2686.310                        
## fl.ri.fit     2  4 5306.978 5325.925 -2649.489 1 vs 2 73.64206  <.0001
##           Model df      AIC      BIC    logLik   Test  L.Ratio p-value
## fl.ri.fit     1  4 5306.978 5325.925 -2649.489                        
## rl.ri.fit     2  6 5257.384 5285.806 -2622.692 1 vs 2 53.59343  <.0001
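The L.Ratio column above can be reproduced directly from the models' log-likelihoods, since the likelihood-ratio statistic for nested models is 2·(logLik₂ − logLik₁). A quick Python check using the values printed above:

```python
# Log-likelihoods from the three fitted growth models (from the output above)
ll_unconditional = -2686.310
ll_fixed_slope   = -2649.489
ll_random_slope  = -2622.692

# Likelihood-ratio statistics for the two nested comparisons
lr_1_vs_2 = 2 * (ll_fixed_slope - ll_unconditional)
lr_2_vs_3 = 2 * (ll_random_slope - ll_fixed_slope)
print(round(lr_1_vs_2, 3), round(lr_2_vs_3, 3))  # -> 73.642 53.594
```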

RQ2 - plateau of performance improvement reached?

  • Method - quadratic model of all trials, fit quadratics and estimate proportion of positive vs negative signs
  • Test - check sign of each quadratic curve, positive = U-shaped with no plateau, negative = ∩-shaped with plateau

First, we will visualise this concept with a quadratic function fitted to data for each subject: y = β₂·session² + β₁·session + ε

The plot is sorted from top left by protocol and gender. The sign of the second-order coefficient of this quadratic function, i.e. the coefficient of session², expresses whether the data plateau. The sign is positive if the curve bends up at the ends (U-shaped), and negative if it bends down at the ends (∩-shaped). A ∩-shaped curve has a plateau; a U-shaped one does not.
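The sign check can be sketched by fitting a quadratic with ordinary least squares and inspecting the leading coefficient (a minimal Python sketch on made-up data; the report's actual fits use the modelling machinery described above):

```python
def quad_lead_coef(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns the sign-bearing
    leading coefficient a (3x3 normal equations solved by Cramer's rule)."""
    p = [2, 1, 0]
    A = [[sum(x ** (pi + pj) for x in xs) for pj in p] for pi in p]
    v = [sum(y * x ** pi for x, y in zip(xs, ys)) for pi in p]

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    Aa = [[v[i] if j == 0 else A[i][j] for j in range(3)] for i in range(3)]
    return det3(Aa) / det3(A)

# A plateauing (∩-shaped) trajectory yields a negative leading coefficient
sessions = [1, 2, 3, 4, 5]
scores = [10, 16, 19, 20, 20]
print(quad_lead_coef(sessions, scores) < 0)  # -> True
```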

RQ2 - testing 1

We can compare the 2nd-order coefficient sign of each subject with their session-coefficient from the random slopes and intercepts growth model above, which captures the degree of learning. This is done by Pearson correlation, reported below.

## 
##  Pearson's product-moment correlation
## 
## data:  LINEAR_LEARNING_COEF and QUADRATIC_SIGN
## t = -2.2365, df = 21, p-value = 0.0363
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.72053273 -0.03221834
## sample estimates:
##        cor 
## -0.4385958

RQ2 - quadratic growth model

Next, we extend the growth modelling approach by adding a fixed effect for the square of the session. We visualise this model in the same way as before.

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##        AIC      BIC    logLik
##   5258.891 5292.042 -2622.446
## 
## Random effects:
##  Formula: ~1 + session | patient
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 4.2710967 (Intr)
## session     0.1489576 0.341 
## Residual    5.0060135       
## 
## Fixed effects: adj_score ~ 1 + session + I(session^2) 
##                  Value Std.Error  DF   t-value p-value
## (Intercept)  13.531318 1.0429672 820 12.973867   0e+00
## session       0.377939 0.0727940 820  5.191905   0e+00
## I(session^2) -0.005719 0.0016867 820 -3.390758   7e-04
##  Correlation: 
##              (Intr) sessin
## session      -0.289       
## I(session^2)  0.396 -0.876
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -3.43071866 -0.57358013 -0.06376315  0.47217879  5.27498571 
## 
## Number of Observations: 845
## Number of Groups: 23

RQ2 - testing 2

We can also test the quadratic fit growth model against the earlier linear model using ANOVA.

##           Model df      AIC      BIC    logLik   Test   L.Ratio p-value
## rl.ri.fit     1  6 5257.384 5285.806 -2622.692                         
## rq.ri.fit     2  7 5258.891 5292.042 -2622.446 1 vs 2 0.4929621  0.4826

RQ3 - what kind of learning is seen, in terms of curve families, e.g. power law or exponential?

  • Method - fit curves from separate families to data, test/compare model fit
    • power law: linear in log-log space
    • exponential: linear in semi-log space
  • Tests - transform curves to a linear space to test fit with r²

We can view the outcome of fitting a power law curve to our data by examining a linear fit in the log-log transformation space: i.e. the log-transform of both data dimensions (score & session). We can further fit a linear growth model to log-log data to find the quality of fit of a power law. For the exponential curve, we simply repeat the process in log-linear space, i.e. log transform score but leave session as is.
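The transformation logic can be illustrated with synthetic data: values generated by a power law y = c·xᵏ are exactly linear under the log-log transform but not under semi-log (a Python sketch; the report's actual fits are mixed-effects growth models, not the simple line used here):

```python
import math

def linear_r2(xs, ys):
    """r^2 of a simple least-squares line y ~ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

sessions = range(1, 40)
power_law = [3.0 * s ** 0.4 for s in sessions]   # y = c * x^k, no noise

# Power-law data: a perfect line in log-log space, a poorer fit in semi-log
loglog = linear_r2([math.log(s) for s in sessions],
                   [math.log(y) for y in power_law])
semilog = linear_r2(list(sessions),
                    [math.log(y) for y in power_law])
print(round(loglog, 4) == 1.0, round(semilog, 4) < 1.0)  # -> True True
```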

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##        AIC      BIC    logLik
##   485.7556 514.1774 -236.8778
## 
## Random effects:
##  Formula: ~1 + log(session) | patient
##  Structure: General positive-definite, Log-Cholesky parametrization
##              StdDev     Corr  
## (Intercept)  0.36505767 (Intr)
## log(session) 0.08067801 -0.548
## Residual     0.29884820       
## 
## Fixed effects: log(adj_score) ~ 1 + log(session) 
##                  Value  Std.Error  DF   t-value p-value
## (Intercept)  2.4680460 0.08328934 821 29.632194       0
## log(session) 0.1224252 0.02068483 821  5.918595       0
##  Correlation: 
##              (Intr)
## log(session) -0.632
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -7.27376238 -0.48760915  0.05495429  0.60067069  3.63167906 
## 
## Number of Observations: 845
## Number of Groups: 23

## Linear mixed-effects model fit by REML
##  Data: scorelong 
##       AIC      BIC    logLik
##   490.263 518.6848 -239.1315
## 
## Random effects:
##  Formula: ~1 + session | patient
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev      Corr  
## (Intercept) 0.325712158 (Intr)
## session     0.007295471 -0.346
## Residual    0.298101414       
## 
## Fixed effects: log(adj_score) ~ 1 + session 
##                Value  Std.Error  DF  t-value p-value
## (Intercept) 2.614301 0.07107060 821 36.78456       0
## session     0.009673 0.00180475 821  5.35972       0
##  Correlation: 
##         (Intr)
## session -0.417
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -7.15317963 -0.49192923  0.07123171  0.58319751  3.60329584 
## 
## Number of Observations: 845
## Number of Groups: 23

RQ3 - testing

We then compare the fit quality of these models. Comparison by ANOVA does not make sense here, because the fixed effects change between models, but we can observe the differences in the model-fit indices AIC and BIC. In the same way, we can compare the best-fitting curve to the best linear model, above.

## Random slope+intercept linear model: 
##       sigma    logLik      AIC      BIC deviance 
##  1 5.037368 -2622.692 5257.384 5285.806       NA
## Random slope+intercept log-log model: 
##        sigma    logLik      AIC      BIC deviance 
##  1 0.2988482 -236.8778 485.7556 514.1774       NA
## Random slope+intercept log-linear model: 
##        sigma    logLik     AIC      BIC deviance 
##  1 0.2981014 -239.1315 490.263 518.6848       NA

RQ4 - Is the skill acquisition theory supported by our novel non-parametric model?

  • Method - fit the session-wise magnitude scores (geometric mean) and consistency index (Kendall correlation) to models of possible learning trajectories: monotonic, Fitts-Posner; with the following fitting methods:
    • Kendall correlation: across-sessions correlation
    • Cosine similarity: across-sessions custom profile of scores/change indices.
  • Tests - test significance of distribution and model fit
    • difference of cosine value distribution mean from 0
    • direct estimate of fit from Kendall correlation/cosine similarity

Given LCs based on the two types of session-performance index, magnitude and consistency, we want to establish if they display a pattern that matches skill acquisition theory. The hypothetical skill acquisition learning curve follows the Fitts-Posner three stage model (REF - see Edua’s theory text 07.06.2017).

We fit our data to this model by taking the cosine similarity of each subject’s LC with a canonical LC that represents Fitts’ model: this is our ideal LC. We can test whether the resulting distribution mean differs from 0 for the group, in order to determine if this model captures the learning that we know has occurred.

We also want a comparison model, to determine whether our Fitts model fits the data better than some simpler explanation. Cosine similarity ranges from -1 to 1, as does correlation. Thus we can use Kendall correlation across sessions for score magnitude, to derive a monotonic LC. We can do similarly for consistency, by aggregating the per-session correlations to derive a consistency LC (using the Hunter-Schmidt method; see Zhang & Wang (2014), Multivariate Behavioral Research, 49:2, 130-148).
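At its core, the Hunter-Schmidt aggregation referenced above is a sample-size-weighted mean of correlations. A bare-bones Python sketch (the full method also involves artifact corrections not shown here; the values are made up):

```python
def hunter_schmidt_mean(rs, ns):
    """Bare-bones Hunter-Schmidt aggregate: sample-size-weighted mean of
    per-session correlations r_i, weighted by trial counts n_i."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)

# Made-up per-session Kendall correlations and their trial counts
session_rs = [0.4, 0.1, -0.2, 0.3]
session_ns = [6, 4, 2, 8]
print(round(hunter_schmidt_mean(session_rs, session_ns), 3))  # -> 0.24
```

Weighting by trial count means sessions with more trials, whose correlations are estimated more reliably, contribute more to the aggregate.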

Fitts’ model has three phases; our initial parameterisation of it is (0, 1, 0.5). The choice of model impacts the testing outcomes very strongly and will have to be explored in more detail later.
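Before taking the cosine similarity, a short canonical profile such as (0, 1, 0.5) must be stretched to each subject's number of sessions. Linear interpolation is one simple way to do this (a Python sketch; the anchor values come from the report, but the interpolation scheme is an assumption):

```python
def stretch_profile(anchors, n):
    """Linearly interpolate a short canonical profile (e.g. the three
    Fitts-Posner phase anchors) to n equally spaced session points."""
    out = []
    for i in range(n):
        pos = i * (len(anchors) - 1) / (n - 1)   # position in anchor space
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, len(anchors) - 1)
        out.append(anchors[lo] * (1 - frac) + anchors[hi] * frac)
    return out

ideal = stretch_profile([0.0, 1.0, 0.5], 5)
print([round(v, 2) for v in ideal])  # -> [0.0, 0.5, 1.0, 0.75, 0.5]
```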

## [1] "Fitt's model for monotone improvement =  0, 1, 0.5"
## [1] "Fitt's model for geometric mean score =  -0.5, 0, 0.5"

“MonoLC.not” = monotonic LC (correlation across session means) for NOT-INVERSE trials data:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.18413  0.03821  0.25042  0.19629  0.38310  0.47368

“ConsLC.not” = consistency LC (aggregate of session-wise correlations) for NOT-INVERSE trials data:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.11473 -0.01019  0.10126  0.09519  0.18037  0.31624

“IscrLC.not” = per-session geometric mean score cosine similarity LC for NOT-INVERSE trials data:

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.100746 -0.004049  0.061513  0.054777  0.129322  0.182141

“IcorLC.not” = per-session correlations cosine similarity LC for NOT-INVERSE trials data:

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## -0.29381  0.05013  0.17940  0.16393  0.36723  0.48241

RQ4 - Examine LCs by plotting

It is useful to look at the LC data in raw form. The consistency, monotonic and ideal LCs are real-valued, share the same range [-1, 1], and have a similar interpretation (the value expresses closeness to perfect performance under the model being used). Thus, we examine the four LCs side-by-side in the same plot.

## No id variables; using all as measure variables

We can also sort the main LCs and view scatterplots to get a sense of how clustered the results are.

We can explore how the LCs relate to each other using a correlation matrix plot. Variables with their ranges lie on the diagonal. Over the diagonal are correlations of variable-pairs, and confidence intervals (in parentheses). Under the diagonal are loess-curve fits to the scatter plots of variable-pairs.

RQ4 - Testing LCs

First, we want to know whether each LC distribution was significantly biased, i.e. does the LC represent a consistent pattern of change across the group (do they improve)?

The monotonic and consistency LCs are based on calculations of Kendall’s tau. Under the null hypothesis of independence of X and Y, the sampling distribution of tau has an expected value of zero. Thus, the group-wise null hypothesis can be that the distribution of LCs has a mean value close to zero. Because we have a small sample, we will not assume that the LCs are normally distributed, but instead use Wilcoxon’s non-parametric test.

Similarly using Wilcoxon’s non-parametric test, we also test whether the group-wise Ideal LC distribution was significantly biased, i.e. do they learn a skill?
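The V statistic reported by the Wilcoxon signed-rank tests below is the sum of the ranks (by absolute value) of the positive observations. A minimal Python sketch, ignoring ties and the p-value computation (the LC values are made up):

```python
def wilcoxon_v(xs):
    """Signed-rank V: rank the nonzero |x_i| from smallest to largest,
    then sum the ranks of the positive x_i (tie handling omitted)."""
    nonzero = [x for x in xs if x != 0]
    ranked = sorted(nonzero, key=abs)
    return sum(rank for rank, x in enumerate(ranked, start=1) if x > 0)

# Made-up LC values: mostly positive, so V is near its maximum n*(n+1)/2
lc = [0.21, -0.05, 0.10, 0.31, 0.02]
print(wilcoxon_v(lc))  # -> 13
```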

## 
##  Wilcoxon signed rank test
## 
## data:  MonoLC.not[, 2]
## V = 247, p-value = 0.0004079
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
##  0.1047619 0.3114846
## sample estimates:
## (pseudo)median 
##      0.2075842
## 
##  Wilcoxon signed rank test
## 
## data:  ConsLC.not[, 2]
## V = 234, p-value = 0.002416
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
##  0.0373236 0.1514861
## sample estimates:
## (pseudo)median 
##     0.09796444
## 
##  Wilcoxon signed rank test
## 
## data:  IscrLC.not[, 2]
## V = 225, p-value = 0.006711
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
##  0.01751030 0.09174911
## sample estimates:
## (pseudo)median 
##     0.05616349
## 
##  Wilcoxon signed rank test
## 
## data:  IcorLC.not[, 2]
## V = 227, p-value = 0.005414
## alternative hypothesis: true location is not equal to 0
## 95 percent confidence interval:
##  0.06977147 0.27633928
## sample estimates:
## (pseudo)median 
##       0.168556

Finally, the idea of including the monotonic and consistency LCs is that they provide a simple base case against which to compare the Ideal LCs. We do this by comparing how well each fits the data. Since all LCs share the same distribution range, they can all be subjected to the same logic: a perfect relation between model and data would give a score of ±1, and an orthogonal relation gives 0. Thus the model fit can be calculated simply as (1/n)·Σᵢ₌₁ⁿ (1 − |xᵢ|), i.e. the average distance of the absolute values from 1. As usual with fit indices, lower is better!
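The fit index can be computed directly (a minimal Python sketch; the similarity values are made up):

```python
def lc_model_fit(similarities):
    """Mean distance of |similarity| from 1: 0 = perfect fit, 1 = worst.
    Lower is better."""
    return sum(1 - abs(x) for x in similarities) / len(similarities)

# Made-up cosine-similarity values for a group of subjects
print(round(lc_model_fit([0.9, -0.8, 0.3, 0.5]), 3))  # -> 0.375
```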

## MonoLC model fit:  0.7472654
## ConsLC model fit:  0.8714739
## IdealLC.score model fit:  0.9196877
## IdealLC.correl model fit:  0.7543331

Part A - DISCUSSION

RQ1

For RQ1, we see that the random slopes and intercepts model has the best fit, significant by ANOVA at p<.0001. The group-level coefficient of session is ~0.16, giving a total rise of ~6.08 over the measured 38 sessions, significant at p<0.00001.

RQ2

For RQ2, we see that most subjects (n=15) have a plateau, i.e. their quadratic is ∩-shaped (negative second-order sign). The minority with no plateau (U-shaped curves, n=8) is big enough to be meaningful, though. The Pearson correlation between the second-order sign and the linear learning coefficient is r=-0.44, significant at p<0.05. Since negative signs indicate plateaued curves, the negative correlation indicates that subjects who learned more tended to plateau.

Interestingly, the quadratic growth model doesn’t capture the U-shape present in some subjects’ data (all individual curves in the plot are ∩-shaped); it also doesn’t fit the data any better than the linear model. Thus it is probably the wrong way to approach this question.

RQ3

The subject-wise plots show that a power law AND an exponential curve fit very well to the data (data are almost linear in the transform space), for quite a few subjects, but the fit is not good for a substantial minority. The comparison of fitting indices shows that the two curve families are relatively equal: power law/log-log is slightly better than exponential/log-linear. However both are an order of magnitude better than the linear model, showing that even if these curves do not fit perfectly for everyone (motivating the non-parametric approach), the majority pattern is that subjects learn in a classic ‘power-law’ way.

RQ4

There are quite a few results here:

  • The boxplot and Wilcoxon tests show that all LCs capture some kind of positive relationship: all are significantly different from zero.
  • The correlation matrix indicates that LCs based on the same data (session-wise correlations or mean scores) are quite highly correlated (>0.7).
  • The scatter plots show that each LC is distributed quite uniformly across its range: no strong clustering.
  • The fit values are all quite poor, especially for ConsLC (0.87) and IdealLC.score (0.92).
  • Finally, the comparison of LCs by fit indicates that (a) IdealLC.score is a poorly chosen model compared with MonoLC; and (b) IdealLC.correl improves over the base model ConsLC.

Further work?

RQ1-3 seem like good and complete analyses, from which insights and reporting can be drawn. RQ4 still seems to lack a major insight: it would be great to try improving the model fitting scores by optimisation; however this might not be possible in the near future.

LC Analysis report - Part B - Methods + Results

Part B - Data preparation

TODO: describe data for Part B RQs

For each dataset we create a new index of sessions so that rows can be aligned according to how many sessions of a protocol were conducted, as opposed to what the session number was when starting. Thus, e.g. transfer trials are indexed from 1 to 10 for all subjects, regardless of what session it was that their transfer trials really started (first transfer trial ranged from session 28 to session 35, depending on subject).
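The re-indexing can be sketched as follows (illustrative Python; the field names are assumptions):

```python
from collections import defaultdict

def reindex_sessions(trials):
    """Replace absolute session numbers with a per-subject 1..k index over
    the sessions that actually contain trials, so rows align by count."""
    sessions = defaultdict(set)
    for t in trials:
        sessions[t["subject"]].add(t["session"])
    index = {subj: {s: i for i, s in enumerate(sorted(ss), start=1)}
             for subj, ss in sessions.items()}
    return [{**t, "session_idx": index[t["subject"]][t["session"]]}
            for t in trials]

# Two subjects whose transfer trials start in different absolute sessions
trials = [{"subject": "P1", "session": 28},
          {"subject": "P1", "session": 30},
          {"subject": "P2", "session": 31}]
print([t["session_idx"] for t in reindex_sessions(trials)])  # -> [1, 2, 1]
```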

However, pruning out sessions with 1 trial results in losing quite a lot of data.

The total number of sessions per subject for each training mode now varies quite a bit. This is not a problem in itself, because our LC calculation methods are not sensitive to small differences in N, except when N is very small (see more below). However, for the transfer training mode, removing sessions with one trial loses entire subjects. For this reason, and because parametric and session-wise LCs measure a different thing from trial-wise correlation-based LCs, we can use different datasets for each: for session LCs, we will include sessions with one trial.

We can further subdivide data by subjects in TB protocol and subjects in SMR protocol…